USER INPUT DEVICE AND METHOD FOR INTERACTION WITH GRAPHIC IMAGES

FIELD OF THE INVENTION 

The present invention relates generally to user input devices and methods for 
effecting movement of an object on a graphical display, and more specifically to an input 
device and method wherein video images of the user are captured and processed to 
provide a signal effecting translational and/or rotational movements of an object on a
graphical display. In particular, this invention is applicable to graphical entertainment
systems such as video games. 

BACKGROUND OF THE INVENTION 

Systems, methods, and input devices employing video images are utilized for
effecting the movement of an object on a graphical display such as a video monitor. 
Frequently, such video input devices are responsive to the movement or position of a user in 
the field of view of a video capture device. More recently, video image processing has been 
used to translate the movement of the user that has been captured as a sequence of video 
images into signals for game control. Prior art input systems include a video capture device 
that scans a field of view in which a system user stands. The captured video image is applied 
to a video digitizer that provides digital output to a processor that analyzes and processes the 
digital information received from the digitizer and, based upon the position or movement of 
the participant in the field of view, the processor produces signals that are used by the graphic 
generating system to move objects on the display. Although the operation or output of the
devices or graphical displays can thereby be affected by the position or movement of the
participant, the computer processing required is frequently extensive and complex,
tending to require substantial computer and/or time resources.

In addition, known devices and methods employing user video image data that are 
used to effect the movement of an object on a graphical display are typically characterized by 
significant encumbrances upon the participant within the video camera field of view. Such
systems may include additional equipment that the participant is required to wear, such as arm
coverings or gloves with integral, more easily detectable portions or colors, and/or visible light
sources such as light emitting diodes. However, such systems do not allow for the
ease-of-use, quick response, and simplicity needed to provide a user input device capable of meeting
marketability requirements for consumer items such as might be required of video game
controllers.

Furthermore, known systems include additional analysis of the video images so as to
understand or recognize the movement that is occurring, such as comparison to pre-existing
marks, which adds to system complexity and response time, making them impractical
for widespread use.

Moreover, although known systems may require the input video image processors to 
recognize and determine a significantly large number of segments, boundaries, and/or boxes in
order to produce the output signals for graphical display control purposes, these systems do
not allow for the calculation of an array of control signals based upon a minimal initial
determination of limited segments/moments, nor do these systems provide for production of
output signals in a simple, smooth fashion suitable for times and systems in which the input 
video resolution is low.

Therefore, present systems employing user video image input for interaction with an
object on a graphical display are generally unable to offer the simplicity, responsiveness, and
mass-market performance required while maintaining an effective level of control
over the output graphical display.



SUMMARY AND OBJECTS OF THE PRESENT INVENTION 



In light of the disadvantages described above with respect to the present state of
the art of user input employing video images for interaction with a graphical display, it is
an object of the present invention to provide a user input system and method that afford a
simplicity of design and methodology, yet provide for robust, effective interaction.

It is a further object of embodiments of the present invention to provide a user 
input system that allows for the calculation of an array of control signals from the initial 
determination of only several segments/moments. 

It is yet a further object of embodiments of the present invention to provide a user 
input system that requires computer processing capacity compatible with consumer
entertainment systems. 

It is yet a further object of embodiments of the present invention to provide a user 
input system without any user encumbrances within the video camera field of view. 

A device and method for effecting movement, responsive to user input, of an
object on a graphical display are disclosed. An input device comprises a component for
capturing video images, an input image processor that processes the captured video
images and generates an output signal responsive to motion from the video images, and
an output image processor that is programmed to effect movement of an object on the
graphical display in response to signals received from the input image processor.
Various algorithms are employed within the input image processor to determine initial
and derivative data that effect the movement of the object on the graphical display. In a
preferred embodiment, video images are captured and processed to isolate a human form
from a background, arm position and movement data are calculated from the human
form, and a signal is generated by the input image processor that is responsive to this data
for controlling the movement of an object, such as a bird, on a graphical display. The
movement controlled on the graphical display can take the form of a moving object, or of 
the change of perspective that such an object might undergo (e.g., bird's eye view). 

Other features and advantages of the present invention will be apparent from the 
accompanying drawings and from the detailed description that follows.



BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention is illustrated by way of example and not limitation in the 
figures of the accompanying drawings, in which like references indicate similar elements, 
and in which: 

Figure 1 is a block diagram of an exemplary user input system for interaction with
an object on a graphical display that can be used to implement embodiments of the
present invention;

Figure 2 illustrates a user input system for interaction with an object on a
graphical display, according to one embodiment of the present invention;

Figure 3 is an exemplary diagram of a human image showing division into left
and right arm sub-images for determination of arm angles, according to one embodiment
of the present invention;

Figure 4 is a flow chart that illustrates the steps of effecting movement, 
responsive to the movement of a user's arms, of an object on a graphical display, 
according to one embodiment of the present invention;

Figure 5 illustrates the correlation between a user's arms and an object on a 
graphical display for a first instance of user position, according to one embodiment of the 
present invention; 

Figure 6 illustrates the correlation between a user's arms and an object on a 
graphical display for a second instance of user position, according to one embodiment of
the present invention;

Figure 7 illustrates the correlation between a user's arms and an object on a
graphical display for a third instance of user position, according to one embodiment of 
the present invention; 

Figure 8 illustrates the correlation between a user's arms and an object on a 
graphical display for a fourth instance of user position, according to one embodiment of
the present invention; 

Figure 9 illustrates the correlation between a user's arms and an object on a 
graphical display for a fifth instance of user position, according to one embodiment of the 
present invention; and 

Figure 10 is a block diagram of an exemplary processing system that implements
user input devices and methods according to embodiments of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

A device and method for effecting movement, responsive to user input, of an object on 
a graphical display are disclosed. In the following description, for purposes of explanation,
numerous specific details are set forth in order to provide a thorough understanding of the
present invention. It will be evident, however, to one of ordinary skill in the art, that the
present invention may be practiced without these specific details. In other instances, well-known
structures and devices are shown in block diagram form to facilitate explanation. The
description of preferred embodiments is not intended to limit the scope of the claims appended 
hereto. 

Hardware Overview

Aspects of the present invention may be implemented by devices capable of 
performing basic video image processing and capable of graphical display. Figure 1 is a block 
diagram of an exemplary user input system for interaction with an object on a graphical
display that can be used to implement embodiments of the present invention. As shown in
Figure 1, user input system 100 comprises a video capture device 102, an input image
processor 104, an output image processor 106, and a video display device 108.

The video capture device 102 can be any device capable of capturing sequences of 
video images, and, in the presently preferred embodiment, is a digital video camera (such as a
"web-cam"), or similar image capturing device. The input image processor 104 translates the
captured video images of human arm motion into signals that are delivered to an output image
processor. In one embodiment, the input image processor 104 is programmed to: isolate the
human form from the background in the captured video image, isolate the human arm portions
from the torso, determine the position and movement of the human arms, and generate an
output signal responsive to the position and/or movement of the human arms. The output
image processor 106 is programmed to effect translational and/or rotational movement of an 
object on the video display device 108 in response to signals received from the input image 
processor 104. 

These and additional aspects of the present invention may be implemented by one or
more processors which execute software instructions. According to one embodiment of the
present invention, a single processor executes both input image processing and output image
processing. However, as shown in the figures and for ease of description, the processing
operations are shown as being divided between an input image processor 104 and an output
image processor 106. It should be noted that the invention is in no way to be interpreted as
limited to any special processor configuration, such as more than one processor. The multiple 
processing blocks shown in Figure 1 and the other Figures are shown only for convenience of 
description. 

Figure 2 illustrates an input system for user interaction with an object on a graphical 
display, according to embodiments of the present invention. Input system environment 200
includes user 202, video capture device 204, video display device 206, and console 208
containing the processor functionality, such as a video game machine. User 202 in input
system environment 200 should be located within the field of view 210 of the video capture
device 204. This processing system 208 can be implemented by an entertainment system,
such as a Sony® Playstation™ II or Sony® Playstation™ I type of processing and computer
entertainment system, and such implementation is described in more detail below in the
Preferred Embodiment section. It should be noted, however, that processing system 208 can
be implemented in other types of computer systems, such as personal computers, workstations,
laptop computers, wireless computing devices, or any other type of computing device that is
capable of receiving and processing graphical image data.
Image Processing Method

The systems of Figures 1 and 2 are implemented along with a method for generating
signals to effect translational and/or rotational movements of an object on a graphical display
using human arm position and movement data captured by a video device. As the user 202
moves or positions himself within the field of view 210 of camera 204, corresponding
movements or positioning of an object or objects on the display device 206 are effected. For
example, the movement of the user can be used to move a cursor or animated character on the
display device relative to a displayed background scene. In a preferred embodiment, the steps
of this video processing methodology are: (1) subtraction of the background within the field of
view, (2) determination of field of view object, (3) determination of arm/appendage angle or
position, and (4) determination of flight parameters.
Step 1: Background Subtraction

When a person or other object that is used to control graphic display movement is in
the field of view 210, the image of the person is captured by the digital camera 204 to produce
pixel data for processing by processor unit 208. In one embodiment of the present invention,
background subtraction is used to create a per-pixel labeling of the image as either person
(foreground) or non-person (background). This is accomplished by storing a frame from the
video sequence when the scene does not include the person. The stored frame is subtracted
from the live video sequence to create the foreground image using a subtractive filtering
process. There are several variations on how this subtraction might be accomplished. In one
embodiment, a simple thresholding scheme is used on the weighted sum of the luminance and
chrominance to determine whether a pixel is in the foreground or background. The basic
process is as follows:

1. Obtain static background Y0 U0 V0 frames.

2. Smooth images Y0 U0 V0 using a 5x5 Gaussian convolution.

3. Obtain current Y U V video frames.

4. Smooth images Y U V using a 5x5 Gaussian convolution.

5. For each pixel in Y, compute Ydif = abs( Y - Y0 ).

6. For each pixel in U, compute Udif = abs( U - U0 ).

7. For each pixel in V, compute Vdif = abs( V - V0 ).

8. For each pixel in Ydif Udif Vdif, compute Sum = Ydif + Udif*8 + Vdif*8.

9. For each pixel in Sum, compute Foreground = 1 if Sum > Threshold, otherwise Foreground = 0.

10. Erode Foreground using a standard erosion morphological filter (to remove any single-pixel
erroneous measurements, such as those caused by salt-and-pepper noise).

In general, only the third through tenth steps described above are repeated every video frame.
In the above process, Y represents the luminance of the pixels, and U and V represent the
chrominances of the pixels.
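The differencing, thresholding, and erosion steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any embodiment: the function names `foreground_mask` and `erode`, the list-of-lists image layout, the default chrominance weight, and the 3x3 erosion window are all choices made for the example, and the Gaussian smoothing steps are omitted for brevity.

```python
def foreground_mask(background, current, threshold, chroma_weight=8):
    """Label each pixel 1 (foreground) or 0 (background).

    `background` and `current` are (Y, U, V) tuples of equally sized
    2-D lists. `chroma_weight` is an illustrative weighting of the
    chrominance differences relative to the luminance difference.
    """
    (y0, u0, v0), (y, u, v) = background, current
    rows, cols = len(y), len(y[0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ydif = abs(y[r][c] - y0[r][c])
            udif = abs(u[r][c] - u0[r][c])
            vdif = abs(v[r][c] - v0[r][c])
            total = ydif + chroma_weight * (udif + vdif)
            mask[r][c] = 1 if total > threshold else 0
    return mask


def erode(mask):
    """3x3 binary erosion (window size is an assumption).

    A pixel survives only if it and all eight neighbours are set.
    Border pixels are treated as background.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = int(all(mask[r + dr][c + dc]
                                for dr in (-1, 0, 1)
                                for dc in (-1, 0, 1)))
    return out
```

In use, the stored background frame would be passed once as `background`, and each new video frame as `current`; the eroded mask is what the later steps consume.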
Step 2: Person in view decision 

The next step is determining whether a person is in the field of view of the video
capture device or not. This determines whether or not the video image processing of the user
will be used to drive the graphical display application. This step consists of counting the total
number of nonzero pixels in the Foreground image, and making sure that the total falls
between a minimum and a maximum threshold. The minimum threshold serves to ensure that
there is some difference between the static background and the current video frame. The
maximum threshold serves to ensure that this difference is not too great; for example, this
might be caused by a person completely blocking the camera image, or by forgetting to
initialize the stored background frame.
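The person-in-view decision reduces to a pixel count and a range check. The sketch below is illustrative only; the function name `person_in_view`, the parameter names, and the choice of inclusive bounds are assumptions not specified in the text.

```python
def person_in_view(foreground, min_pixels, max_pixels):
    """Return True when the nonzero pixel count lies within the two thresholds.

    `foreground` is a 2-D list of 0/1 values. Inclusive bounds are an
    assumption made for this example.
    """
    count = sum(1 for row in foreground for pixel in row if pixel)
    return min_pixels <= count <= max_pixels
```

A count below the minimum suggests an empty scene, and a count above the maximum suggests a blocked camera or an uninitialized background frame, so in both cases the video input would not be used to drive the display.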
Step 3: Arm angle determination

In one embodiment of the present invention, movement or positioning of the displayed
graphical object on the display is effected by movement or positioning of the in-view person's
arms and/or legs. This process is generally accomplished by computing area statistics of the
isolated foreground image. Figure 3 is an exemplary diagram of a foreground (human) image
showing division into left and right arm sub-images for determination of arm angles,
according to one embodiment of the present invention. Foreground image 302 includes
human figure torso portion 300, left arm sub-image 306 and right arm sub-image 304. First,
as seen in Figure 3, the horizontal extent, "W", of the torso is determined by computing the
centroid and second horizontal moment of the nonzero pixels in the Foreground image:

1. TorsoStartX = CentroidX - Sqrt(SecondMomentX)

2. TorsoEndX = CentroidX + Sqrt(SecondMomentX)

where TorsoStartX is shown as the line running down the left side of torso portion 300, and
TorsoEndX is shown as the line running down the right side of torso portion 300.
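The torso extent calculation can be sketched as follows. The text does not say whether the second horizontal moment is taken about the origin or about the centroid; this example assumes the central moment (the variance of the x coordinates), so that its square root is a distance from the centroid. The function name `torso_extent` is hypothetical.

```python
import math


def torso_extent(foreground):
    """Return (TorsoStartX, TorsoEndX) for a 2-D list of 0/1 pixels.

    Assumes SecondMomentX is the central second moment (variance)
    of the x coordinates of the nonzero pixels.
    """
    xs = [x for row in foreground for x, pixel in enumerate(row) if pixel]
    if not xs:
        raise ValueError("no foreground pixels")
    centroid_x = sum(xs) / len(xs)
    second_moment_x = sum((x - centroid_x) ** 2 for x in xs) / len(xs)
    half_width = math.sqrt(second_moment_x)
    return centroid_x - half_width, centroid_x + half_width
```

Pixels to the left of the first value and to the right of the second value would then form the two arm sub-images.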

Next, the sub-image 306 to the left of TorsoStartX is processed to determine the left
arm angle. The angle of principal moment of the nonzero pixels in this sub-image is
calculated. This angle can range from 0 to 180 degrees, which represents the allowed range of
movement of the left arm. The sub-image 304 to the right of TorsoEndX is processed
identically. The angle of principal moment can be thought of as finding the slope of a line that
best fits the pixels. Finding the angle of principal moment is a standard technique in image
processing and dynamical modeling. Such a technique may be found in the textbook
Mechanics of Materials by Timoshenko and Gere, 1997, for example.
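One common way to compute the angle of principal moment uses the central second moments of the pixel coordinates. The sketch below is an illustration under assumptions: the function names are hypothetical, and the mapping onto a 0 to 180 degree arm angle (with 90 degrees meaning a horizontal arm) is a convention chosen for the example, since the text does not state the reference axis.

```python
import math


def principal_axis_deg(sub_image):
    """Orientation of the best-fit line through the nonzero pixels.

    Returns degrees from the horizontal, in (-90, 90]. Row indices grow
    downward in an image, so y is negated to make "up" positive.
    """
    pts = [(x, -y) for y, row in enumerate(sub_image)
           for x, pixel in enumerate(row) if pixel]
    if not pts:
        raise ValueError("no foreground pixels")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - cy) ** 2 for _, y in pts) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pts) / n
    return math.degrees(0.5 * math.atan2(2.0 * mu11, mu20 - mu02))


def arm_angle_deg(sub_image, arm_points_right):
    """Map the orientation onto 0..180 degrees (assumed convention).

    90 means a horizontal arm and larger values mean a raised arm.
    `arm_points_right` is True when the arm extends toward +x in the image.
    """
    slope_angle = principal_axis_deg(sub_image)
    return 90.0 + slope_angle if arm_points_right else 90.0 - slope_angle
```

Because a line orientation is only defined modulo 180 degrees, a fully raised and a fully lowered arm are indistinguishable by this measure alone; a practical system would need an extra cue for those extremes.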
Step 4: Flight parameter determination

Characteristics regarding the position of the user, and the rate and type of movement of 
his or her arms are used to effect movement of graphical elements displayed on a graphical 
display. For example, the user's movement can be used to control the motion of a game 
character that is used in a flying game in which the character is seen as flying over a 
landscape. 

With regard to the parameters associated with the flying game, the angles of the
person's arms relative to his or her torso are processed to compute the airspeed acceleration,
bank angle, and pitch angle for the flying simulation program. Bank angle is computed from 
the signed difference of the two arm angles. Pitch angle is computed from the average of the 
two arm angles, minus 90 degrees (so that arms straight out represent a zero pitch angle). The 
pitch angle is then scaled down by 0.1 (-90 to 90 pitch would be too large). Airspeed 
acceleration is calculated as the time rate of change of the average of the arm angles (scaled to 
be appropriate). The time rate of change is calculated over several frames to produce a 
smoother signal. 

In general, all of the computed parameters are smoothed before they are used to 
generate graphical display images. This smoothing adds a bit of time lag, but results in a less 
jerky (time delayed) video display. The smoothing is especially desirable when the video 
capture equipment or processing system is capable of only low resolution processing. 

In one embodiment, the calculation of the various parameters to produce graphical
processing variables for a flight simulator or similar flying program is as follows:

1. BankAngle = LeftAngle - RightAngle

2. PitchAngle = ((( LeftAngle + RightAngle ) / 2.0 ) - 90 ) * 0.1

3. Accel = abs(( LeftAngle + RightAngle ) - ( LeftAnglePrev + RightAnglePrev )) * K

4. The smoothing is calculated as follows:

BankAngleSmooth = k1 * BankAngle + ( 1 - k1 ) * BankAngleSmooth
PitchAngleSmooth = k2 * PitchAngle + ( 1 - k2 ) * PitchAngleSmooth
AccelSmooth = k3 * Accel + ( 1 - k3 ) * AccelSmooth

The smoothed quantities are the ones used for the simulation. The constants k1, k2,
and k3 specify the response characteristics of the displayed object. This allows the correlation
of user motion and displayed object motion to be customized for various experiences. For
example, in a flight simulator program, the constants can be programmed to select or dictate
different characteristics related to different types of planes or flying objects, thus allowing
different simulation experiences. For example, one setting of the constants can be used to
simulate a jet feeling, while another setting of the constants can be used to simulate a
hang-glider feeling.
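The flight parameter equations and the smoothing update translate directly into code. The sketch below follows the formulas given above; the function names `flight_parameters` and `smooth`, and the argument names, are hypothetical, and no particular values of K, k1, k2, or k3 are implied.

```python
def flight_parameters(left, right, left_prev, right_prev, accel_scale):
    """Compute (bank, pitch, accel) from arm angles in degrees.

    `accel_scale` plays the role of the constant K in the text.
    """
    bank = left - right
    pitch = (((left + right) / 2.0) - 90.0) * 0.1
    accel = abs((left + right) - (left_prev + right_prev)) * accel_scale
    return bank, pitch, accel


def smooth(previous_smoothed, current, gain):
    """One smoothing update: gain * current + (1 - gain) * previous.

    `gain` corresponds to k1, k2, or k3. A gain near 1 tracks the input
    quickly; a gain near 0 responds slowly and more smoothly.
    """
    return gain * current + (1.0 - gain) * previous_smoothed
```

Each frame, the three raw parameters would be computed once and then passed through `smooth` with their own gain, with the returned value stored as the next frame's `previous_smoothed`. Different gain settings give the quicker or more sluggish response characteristics described above.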

Figure 4 is a flowchart that illustrates the steps of effecting movement of an object on a
display device responsive to the movement of a user's arms following the methodology
described above, for one embodiment of the present invention. The first two steps of
flowchart 400 in Figure 4 comprise the step of background subtraction. In step 404, a
captured video image that does not include the person is stored, and in step 406, the stored
video image is subtracted from the live video sequence. Next, it is determined whether a
person is within the field of view of the digital camera, step 408. Step 410 consists of the first
part of the arm angle determination step, that of determining the horizontal extent of the torso.
In step 412 each arm angle is determined. This is performed by calculating the angle of
principal moment of each arm sub-image.

For the preferred embodiment, the flight parameter determination step is comprised of
the last three steps of Figure 4. First, in step 414, the arm angles are processed to compute the flight
parameters using the same equations as recited above. Second, in step 416, the quantities are
smoothed with functions including constants chosen for a particular flight experience (e.g., jet,
glider, etc.). Finally, in step 418, the smoothed parameters are used in the flight simulation to
effect translational and/or rotational movement of an object on a graphical display.

According to an alternate embodiment of the present invention, a signal is generated
for use within any known electrical communication systems, not merely for effecting
movement of an object in a graphical display. This embodiment can be described as a method 
for generating signals responsive to human arm position and/or movement data, comprising: 
providing an image processor and a device for capturing video images, capturing video images 
with the device and using the image processor to process those images to isolate a human form 
from a background, isolating the arm portions of the human form from a captured video image 
using the image processor, calculating the arm position and movement data using the image 
processor, and generating a signal responsive to the arm position and movement data using the 
image processor. 

Graphical Input and Processing Functionality 

In the presently preferred embodiment, a user's arm action is captured by the video 
capture device and the corresponding action is translated into the motion of a bird character 
shown on a graphical display. Some representative arm actions and their correlation to bird 
movement are illustrated in Figures 5 through 9. Figure 5 illustrates the correlation between 
the flapping of a user's arms and a bird ascending on the graphical display, according to one 
embodiment of the present invention. As shown in Figure 5, correlation system 500 shows 
user 502, having left arm 504 and right arm 506, flapping his arms, as illustrated by
two-directional arrows 508. On the corresponding graphical display 510 and in relation to the
landscape 516, this action correlates to bird 512 ascending, as illustrated by arrow 514. 

Figure 6 illustrates the correlation between a user maintaining his or her arms straight 
out and a bird soaring on the graphical display, according to one embodiment of the present 
invention. As shown in Figure 6, correlation system 600 shows user 602, having left arm 604
and right arm 606, maintaining his arms straight out, as illustrated by horizontal arrows 608. 
On the corresponding graphical display 610 and in relation to the landscape 616, this action 
correlates to bird 612 soaring, as illustrated by the level flight path. 

Figure 7 illustrates the correlation between a user tilting his or her arms to the left and
a bird banking left on the graphical display, according to one embodiment of the present
invention. As shown in Figure 7, correlation system 700 shows user 702, having left arm 704
and right arm 706, positioning his arms so the left arm 704 is lower than the right arm 706, as
illustrated by down arrow 708 and up arrow 710, respectively. On the corresponding
graphical display 714 and in relation to the landscape 716, this action correlates to bird 712
banking left, as illustrated by axes of the bird bodies having a positive slope.

Figure 8 illustrates the correlation between a user tilting his or her arms to the right and
a bird banking right on the graphical display, according to one embodiment of the present
invention. As shown in Figure 8, correlation system 800 shows user 802, having left arm 804
and right arm 806, positioning his arms so the left arm 804 is higher than the right arm 806, as
illustrated by up arrow 808 and down arrow 810, respectively. On the corresponding
graphical display 812 and in relation to the landscape 818, this action correlates to bird 814
banking right, as illustrated by axes of the bird bodies having a negative slope.

Figure 9 illustrates the correlation between a user tucking his or her arms back and a
bird plunging on the graphical display, according to one embodiment of the present invention. 
As shown in Figure 9, correlation system 900 shows user 902, having left arm 904 and right 
arm 906, positioning his arms so as to be tucked back, as illustrated by down arrows 908. On 
the corresponding graphical display 910 and in relation to the landscape 916, this action 
correlates to bird 912 plunging, as illustrated by arrow 914. 

In the preferred embodiment, the user input devices and methods of the present 
invention are implemented by a computer processing system illustrated by the block diagram
of Figure 10. The processing system may represent a computer-based entertainment system
embodiment that includes a central processing unit ("CPU") 1004 coupled to a main memory
1002 and graphical processing unit ("GPU") 1006. The CPU 1004 is also coupled to an
Input/Output Processor ("IOP") Bus 1008. In one embodiment, the GPU 1006 includes an
internal buffer for fast processing of pixel based graphical data. Additionally, the GPU can
include an output processing portion or functionality to convert the image data processed into
standard television signals, for example NTSC or PAL, for transmission to a television
monitor 1007 connected external to the entertainment system 1000 or elements thereof.
Alternatively, data output signals can be provided to a display device other than a television 
monitor, such as a computer monitor, LCD (Liquid Crystal Display) device, or other type of 
display device. 

The IOP bus 1008 couples the CPU 1004 to various input/output devices and other
busses or devices. IOP bus 1008 is connected to input/output processor memory 1010, a
controller 1012, a memory card 1014, a Universal Serial Bus (USB) port 1016, an IEEE 1394
(also known as a FireWire interface) port, and bus 1030. Bus 1030 couples several other
system components to CPU 1004, including operating system ("OS") ROM 1020, flash
memory 1022, a sound processing unit ("SPU") 1024, an optical disc controlling unit 1026,
and a hard disk drive ("HDD") 1028. In one aspect of this embodiment, the video capture
device can be directly connected to the IOP bus 1008 for transmission therethrough to the
CPU 1004; there, data from the video capture device can be used to change or update the 
values used to generate the graphics images in the GPU 1006. Moreover, embodiments of the 
present invention can use a variety of image processing configurations and techniques, such as 
those described in U.S. Patent Application Serial No. 09/573,105 filed May 17, 2000, and 
entitled OUTLINE GENERATING DATA, GENERATING METHOD AND APPARATUS, 
which is hereby incorporated by reference in its entirety. 

Programs or computer instructions embodying aspects of the present invention can be 
provided by several different methods. For example, the user input method for interaction 
with graphical images can be provided in the form of a program stored in HDD 1028, flash 
memory 1022, OS ROM 1020, or on a memory card 1014. Alternatively, the program can be
downloaded to the processing unit 1000 through one or more input ports coupled to the CPU 
1004. The program modules defining the input method can be provided with the game or 
application program that is executed by the CPU 1004 and displayed on display device 1007 
or they may be provided separately from the application program, such as for execution from 
local main memory 1002. 

Embodiments of the present invention also contemplate distributed image processing 
configurations. For example, the invention is not limited to the captured image and display 
image processing taking place in one or even two locations, such as in the CPU or in the CPU 
and one other element. For example, the input image processing can just as readily take place
in an associated CPU, processor or device that can perform processing; essentially all of image
processing can be distributed throughout the interconnected system. Thus, the present
invention is not limited to any specific image processing hardware circuitry and/or software; it
is also not limited to any specific combination of general hardware circuitry and/or software,
nor to any particular source for the instructions executed by processing components.
Other Embodiments

In one embodiment, a flapping noise can be added to the demonstration. The flapping 
noise is triggered when the signed time rate of change of the average of the arm angles 
exceeds a particular threshold. This leads to a flapping noise only when the user moves his or 
her arms down together. The volume of the flapping noise is scaled by that behavior as well,
so a more exaggerated flapping motion produces a louder flapping sound.
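The flapping-sound trigger can be sketched as a thresholded rate with a volume proportional to it. Everything specific here is an assumption made for illustration: the function name `flap_volume`, the sign convention (larger arm angles mean raised arms, so a downstroke is a decrease in the average angle), the linear volume scaling, and the clamp to 1.0.

```python
def flap_volume(avg_angle_prev, avg_angle_now, dt, threshold, gain):
    """Return a playback volume in [0, 1], or 0.0 when no flap is triggered.

    `down_rate` is positive when the average arm angle is decreasing,
    which this example treats as the arms moving down together.
    """
    down_rate = (avg_angle_prev - avg_angle_now) / dt
    if down_rate <= threshold:
        return 0.0
    return min(1.0, gain * down_rate)
```

Because the rate is signed, an upstroke of the same speed yields a negative `down_rate` and therefore no sound.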

A further embodiment has an addition to the demonstration in which the arm angles
are used to index into a pre-generated animation of a bird flapping its wings. The arm angles
are used as indices into this animation. In this embodiment, the demonstration can be from a
first person perspective, so animation of the bird will not be seen, though its shadow, as cast
by the sun on the landscape, can be seen.
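Indexing an animation by arm angle amounts to mapping the angle range onto a frame range. The sketch below is one hypothetical mapping; the function name `animation_frame`, the linear mapping, and the clamping to 0 through 180 degrees are assumptions, since the text does not describe how the index is derived.

```python
def animation_frame(arm_angle_deg, frame_count):
    """Map an arm angle in degrees onto a frame index in [0, frame_count - 1].

    Angles outside 0..180 are clamped. The mapping is linear.
    """
    if frame_count < 1:
        raise ValueError("frame_count must be at least 1")
    clamped = max(0.0, min(180.0, arm_angle_deg))
    index = int(clamped / 180.0 * frame_count)
    return min(frame_count - 1, index)
```

With this mapping, the displayed wing pose follows the user's arm pose directly rather than playing back at a fixed rate.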

In the network demonstration, the actual bird animation corresponding to the motion of
the other person playing is displayed. Several animations of bird variants can exist, such as a dragon,
eagle, and hawk. The smoothing parameters described above will be set to provide a slightly
different flight experience for each of the animations (e.g., the dragon is big and slow, so the
glider parameters are used, while the hawk is small and quick, so the fighter-jet parameters are 
used). 
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As can be appreciated by those of ordinary skill in the art, applications of the graphical 
input and processing methods described herein can be extended to other types of simulations 
or programs other than flying games or simulators. For example, the user motion can be used 
to control objects in video game or educational programs that include driving objects, running 
5 objects, or various sports games that involve user motion, such as skiing, bicycling, fighting, 
and similar action. 

Furthermore, although embodiments of the present invention were illustrated and 
described with respect to movement of the user's arms being used to effect movement of a 
displayed object, it should be noted that similar processing techniques could be used to 
provide graphical control based on movement of the user's legs or head or objects held by the
user. 

Embodiments of the present invention allow for the advantageous and convenient 
implementation of various post-processing filtering and special effects on the displayed object,
such as shadow effects, distortion, morphing effects, and the like. Such post-processing
filtering and special effects are advantageous additions and can be used to capitalize on the
simplicity and efficiency of the fundamental system.

In the foregoing, a user input system for effecting movement of an object on a 
graphical display has been described. Although the present invention has been described with 
reference to specific exemplary embodiments, it will be evident that various modifications and
changes may be made to these embodiments without departing from the broader spirit and
scope of the invention as set forth in the claims. For example, embodiments of the invention
can be extended to applications other than just systems having an output related to flying or
flying games. In general, any graphics-based application in which movement or positioning
of a displayed object is controlled by the movement or positioning of a user can be used in
conjunction with the processing methods described herein. Accordingly, the specification and
drawings are to be regarded in an illustrative rather than a restrictive sense.

CLAIMS



What is claimed is: 

1. An input device for providing a signal to effect translational and/or rotational
movements of an object on a graphical display, comprising:

a device for capturing video images;

an input image processor that translates captured video images of human arm
motion into signals that are delivered to an output image processor, the input image
processor programmed to (a) isolate the human form from the background in the captured
video image; (b) determine the position and movement of the human arms; and (c)
generate an output signal responsive to the position and/or movement of the human arms;
and

an output image processor that is programmed to effect translational and/or
rotational movement of an object on a graphical display in response to the signals
received from the input image processor.

2. The input device of claim 1 wherein the output image processor changes the
graphical display according to the perspective of what a flying object would see.

3. The input device of claim 1 wherein the output image processor generates a
graphical display of a flying object whose position and motion are responsive to the
signal output by the input image processor.

4. A method for generating signals to effect translational and/or rotational
movements of an object on a graphical display using human arm position and movement
data, comprising:

providing an image processor and a device for capturing video images;

capturing video images and processing those images to isolate a human form from
a background;

isolating the arm portions of the human form;

calculating the arm position and movement data; and

generating a signal responsive to the arm position and movement data for
effecting translational and/or rotational movement of an object on a graphical display.

5. A method for generating signals using human arm position and/or movement data,
comprising:

providing an image processor and a device for capturing video images;

capturing video images with the device and using the image processor to process
those images to isolate a human form from a background;

isolating the arm portions of the human form from a captured video image using
the image processor;

calculating the arm position and movement data using the image processor; and

generating a signal responsive to the arm position and movement data using the
image processor.

6. A method for generating signals to effect translational and/or rotational
movements of an object on a graphical display using human arm position and/or
movement data, comprising:

providing an image processor and a device for capturing a video sequence;

capturing, from the video sequence, a frame that does not include a person;

isolating a view comprising a foreground subject image view by performing an
algorithm on the video sequence and the frame that does not include the person;

determining whether the isolated view includes the image of a person;

determining the horizontal extent of the subject's torso so as to isolate the arm
portions of the human form in each captured video frame;

computing the arm angles by calculating angles of principal moment of the
nonzero pixels in the arm portions of the video image; and

generating an arm position data signal responsive to arm angles for effecting the
translational and/or rotational movement of an object on a graphical display.
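
By way of illustration only, the torso-extent and arm-angle steps recited in claim 6 might be sketched as follows for a binary foreground mask stored as a list of rows of 0/1 values. The function names, the column-height rule used to find the torso, and its `fraction` parameter are assumptions introduced for the example. The angle is computed with the standard image-moment formula for the orientation of the principal axis, 0.5 * atan2(2*mu11, mu20 - mu02), which is one common reading of "angles of principal moment".

```python
import math


def torso_extent(mask, fraction=0.6):
    """Return (first, last) column of the torso, or None for an empty mask.

    Assumption: torso columns are those whose count of foreground pixels is
    at least `fraction` of the tallest column (a hypothetical rule).
    """
    heights = [sum(column) for column in zip(*mask)]
    if not heights or max(heights) == 0:
        return None
    cutoff = fraction * max(heights)
    columns = [x for x, h in enumerate(heights) if h >= cutoff]
    return columns[0], columns[-1]


def arm_regions(mask, extent):
    """Split the mask into the parts image-left and image-right of the torso."""
    first, last = extent
    left = [row[:first] for row in mask]
    right = [row[last + 1:] for row in mask]
    return left, right


def principal_axis_angle(mask):
    """Angle (radians, image coordinates) of the principal axis of the
    nonzero pixels, or None if the region contains no nonzero pixels."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
```

A typical use would call `torso_extent` once per frame, pass the result to `arm_regions`, and then call `principal_axis_angle` on each arm region to obtain the two arm angles.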

7. The method of claim 6 wherein the step of determining whether the view includes
a person comprises the steps of:

counting the total number of nonzero pixels in the foreground image; and

ensuring that the total number of nonzero pixels falls within a range defined by a
minimum and a maximum threshold number of pixels.

8. The method of claim 6 wherein the algorithm in the isolating step involves
subtracting the frame that does not include a person from the individual frames in the
video sequence.
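
As an illustrative sketch only, the pixel-count test of claim 7 could be written as below. The function names are hypothetical, and the minimum and maximum thresholds are left as parameters because the claims do not give values for them.

```python
def count_nonzero(mask):
    """Total number of nonzero pixels in a foreground mask (list of rows)."""
    return sum(1 for row in mask for value in row if value)


def view_includes_person(mask, min_pixels, max_pixels):
    """True when the foreground pixel count lies within [min_pixels, max_pixels].

    Too few pixels suggests noise or an empty scene; too many suggests a
    lighting change or camera movement rather than a single person.
    """
    return min_pixels <= count_nonzero(mask) <= max_pixels
```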



9. The method of claim 6 wherein the following algorithm is used in the isolating
step:

(a) obtain static background Y0 U0 V0 frames;

(b) smooth images Y0 U0 V0 using a 5x5 Gaussian convolution;

(c) obtain current Y U V video frames;

(d) smooth images Y U V using a 5x5 Gaussian convolution;

(e) for each pixel in Y, compute Ydif = abs(Y - Y0);

(f) for each pixel in U, compute Udif = abs(U - U0);

(g) for each pixel in V, compute Vdif = abs(V - V0);

(h) for each pixel in Ydif Udif Vdif, compute Sum = Ydif + Udif*8 + Vdif*8;

(i) for each pixel in Sum, compute Foreground = 1 if Sum > Threshold, 0 otherwise;

(j) erode Foreground using a standard erosion morphological filter (to remove any single-pixel
erroneous measurements, such as those caused by salt-and-pepper noise).
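
Steps (a) through (j) above can be sketched as follows, by way of illustration only. Several details are assumptions that the claim does not specify: the 5x5 Gaussian is approximated by the separable binomial kernel (1, 4, 6, 4, 1)/16, image borders are handled by clamping, the erosion uses a 3x3 neighborhood with border pixels set to 0, and all function names are hypothetical. Each image plane is a list of rows of numbers.

```python
KERNEL = (1, 4, 6, 4, 1)  # binomial approximation of a Gaussian; sums to 16


def _blur_rows(img):
    """Convolve every row with KERNEL, clamping indices at the borders."""
    w = len(img[0])
    return [[sum(k * row[min(max(x + i - 2, 0), w - 1)]
                 for i, k in enumerate(KERNEL)) / 16.0
             for x in range(w)] for row in img]


def _transpose(img):
    return [list(column) for column in zip(*img)]


def gaussian5x5(img):
    """Steps (b) and (d): separable 5x5 smoothing (rows, then columns)."""
    return _transpose(_blur_rows(_transpose(_blur_rows(img))))


def abs_diff(a, b):
    """Steps (e) to (g): per-pixel absolute difference of two planes."""
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def erode3x3(mask):
    """Step (j): keep a pixel only if its whole 3x3 neighborhood is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(mask[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out


def foreground_mask(yuv, yuv0, threshold, chroma_weight=8):
    """Return the eroded Foreground mask for current planes `yuv` = (Y, U, V)
    against static background planes `yuv0` = (Y0, U0, V0)."""
    y, u, v = (gaussian5x5(plane) for plane in yuv)
    y0, u0, v0 = (gaussian5x5(plane) for plane in yuv0)
    ydif, udif, vdif = abs_diff(y, y0), abs_diff(u, u0), abs_diff(v, v0)
    # Steps (h) and (i): weighted sum of differences, then threshold.
    fg = [[1 if yd + chroma_weight * (ud + vd) > threshold else 0
           for yd, ud, vd in zip(ry, ru, rv)]
          for ry, ru, rv in zip(ydif, udif, vdif)]
    return erode3x3(fg)
```

The chroma differences are weighted eight times more heavily than the luma difference, as in step (h), which makes the mask less sensitive to brightness changes such as shadows than to changes in color.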

10. The method of claim 6 wherein the arm position/movement data signals generated
in the generating step are selected from the group consisting of signals related to object
airspeed acceleration, bank angle, and pitch angle.

11. The method of claim 6 wherein the arm position/movement data signals
generated in the generating step are determined with the inclusion of smoothing
constants.
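
By way of illustration only, one way a smoothing constant could enter the signal computation of claim 11 is a first-order exponential filter applied to each control signal. This is an assumption and not the disclosed formulation: the filter form, the function names, and the numeric values of the "glider" and "fighter_jet" presets (named after the parameter sets mentioned in the description) are all hypothetical.

```python
# Hypothetical smoothing constants in (0, 1]; smaller values respond more slowly.
SMOOTHING_PRESETS = {
    "glider": 0.05,       # assumed sluggish response, e.g. for the dragon
    "fighter_jet": 0.40,  # assumed quick response, e.g. for the hawk
}


def smooth_signal(previous, target, constant):
    """Move the previous signal value a fraction `constant` toward target."""
    if not 0.0 < constant <= 1.0:
        raise ValueError("constant must be in (0, 1]")
    return previous + constant * (target - previous)


def smooth_sequence(targets, constant, start=0.0):
    """Apply smooth_signal frame by frame and return every smoothed value."""
    value = start
    history = []
    for target in targets:
        value = smooth_signal(value, target, constant)
        history.append(value)
    return history
```

Under these assumed values, a step change in bank angle would be followed quickly with the "fighter_jet" preset and only gradually with the "glider" preset, giving each animation a different flight feel.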



12. A method for generating signals for use in a flight simulator graphical display
using human arm position data to effect translational and/or rotational movement,
comprising:

providing a device for capturing video images and an image processor;

capturing video images with the device and using the image processor to process
those images to isolate a human form from a background;

isolating the arm portions of the human form from a captured video image using
the image processor;

calculating arm position and movement data using the image processor; and

generating a signal responsive to the arm position and movement data using the
image processor for use in generating the state of a flight simulator graphical display.

13. The method of claim 12 wherein the flight simulator graphical display includes as
an object a flying creature that moves wings.

14. The method of claim 12 wherein the flight simulator graphical display depicts a 
change in perspective of what a flying creature would see. 

15. The method of claim 13 further including the step of generating flapping noises
corresponding to the movement of the wings of the flying creature.



16. The method of claim 15 wherein the volume of the flapping noises increases with
an increased rate of captured arm motion.

17. The method of claim 15 wherein the flapping noise is triggered when the signed
time rate of change of the average of the arm angles exceeds a pre-determined threshold.

18. An article of manufacture embodying a program of instructions executable by a
machine, the program of instructions including instructions for:

capturing video images and processing those images to isolate a human form from
a background;

isolating the arm portions of the human form;

calculating the arm position and movement data; and

generating a signal responsive to the arm position and movement data for
effecting translational and/or rotational movement of an object on a graphical display.

19. The article of manufacture of claim 18 wherein the signal generated by the
program of instructions is used to generate the state of a flight simulator graphical 
display. 



20. An article of manufacture embodying a program of instructions executable by a
machine, the program of instructions including instructions for:

capturing video images with the device and using the image processor to process
those images to isolate a human form from a background;

isolating the arm portions of the human form from a captured video image using
the image processor;

calculating the arm position and movement data using the image processor; and

generating a signal responsive to the arm position and movement data using the
image processor.

21. An article of manufacture embodying a program of instructions executable by a
machine, the program of instructions including instructions for:

capturing, from the video sequence, a frame that does not include a person;

isolating a view (foreground/subject image) by performing an algorithm on the
video sequence and the frame that does not include a person;

determining whether the isolated view includes the image of a person;

determining the horizontal extent of the subject's torso so as to isolate the arm
portions of the human form in each captured video frame;

computing the arm angles by calculating angles of principal moment of the
nonzero pixels in the arm portions of the video image; and

generating an arm position/movement data signal responsive to arm angles for
effecting the translational and/or rotational movement of an object on a graphical display.

ABSTRACT OF THE INVENTION



A device and method for effecting movement, responsive to user input, of an
object on a graphical display are disclosed. An input device comprises a component for
capturing video images, an input image processor that generates an output signal
responsive to motion from the video images, and an output image processor that is
programmed to effect movement of the object on the graphical display in response to
signals received from the input image processor. Various algorithms are employed
within the input image processor to determine initial and derivative data that controls the
movement of the object on the graphical display. In a preferred embodiment, video
images are captured and processed to isolate a human form from a background, arm
position and movement data are calculated from the human form, and a signal is
generated responsive to this data for controlling the movement of an object, such as a
bird, on a graphical display. The movement controlled on the graphical display can take
the form of a moving object, or of the change of perspective that such an object might
undergo, for example, a bird's eye view.


